Let’s first load the diamonds dataset.
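library(ggplot2)  # provides the diamonds dataset and the ggplot() functions
library(plotly)   # provides ggplotly()
data("diamonds")
head(diamonds)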
## # A tibble: 6 × 10
## carat cut color clarity depth table price x y z
## <dbl> <ord> <ord> <ord> <dbl> <dbl> <int> <dbl> <dbl> <dbl>
## 1 0.23 Ideal E SI2 61.5 55 326 3.95 3.98 2.43
## 2 0.21 Premium E SI1 59.8 61 326 3.89 3.84 2.31
## 3 0.23 Good E VS1 56.9 65 327 4.05 4.07 2.31
## 4 0.29 Premium I VS2 62.4 58 334 4.2 4.23 2.63
## 5 0.31 Good J SI2 63.3 58 335 4.34 4.35 2.75
## 6 0.24 Very Good J VVS2 62.8 57 336 3.94 3.96 2.48
Bar Plot: Great for categorical data.
Scatter Plot: Perfect for two numerical variables.
Histogram: Ideal for showing the distribution of one numeric variable.
Practice 1: Bar Plot (Categorical Data)
Let’s start simple. A bar plot is great for comparing categories. Create a ggplot bar chart comparing the counts of diamonds by their cut quality.
# Create a ggplot bar chart
bar_plot <- ggplot(diamonds, aes(x = cut)) +
  geom_bar() +
  ggtitle("Diamonds by Cut Quality") +
  theme_minimal()
bar_plot
Question: What do you notice about the distribution of cuts? Seems like ‘Ideal’ is the most common cut in this dataset!
Now, let’s add some bling by converting this into an interactive Plotly plot.
# Convert to interactive plot
ggplotly(bar_plot)
Look! You can now hover over the bars and see exactly how many diamonds there are in each category. Much shinier!
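To double-check the numbers shown in the hover labels, you can also compute the counts directly. This is a minimal sketch using base R's `table()`, assuming ggplot2 is installed; `cut_counts` is a name introduced here for illustration.

```r
library(ggplot2)  # provides the diamonds dataset

# Count diamonds per cut level; these are the bar heights geom_bar() draws
cut_counts <- table(diamonds$cut)
cut_counts

# Name of the most frequent cut
names(which.max(cut_counts))
```

The counts printed here should match what you see when hovering over each bar.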
Practice 2: Scatter Plot (Two Numeric Variables)
Scatter plots are perfect for showing relationships between two numerical variables. Let’s check out the relationship between carat (weight) and price. Is bigger always better?
# Create a ggplot scatter plot
scatter_plot <- ggplot(diamonds, aes(x = carat, y = price)) +
  geom_point(alpha = 0.5, color = "blue") +
  ggtitle("Carat vs Price") +
  theme_minimal()
scatter_plot
Now, convert this to a Plotly plot so you can zoom in and get up close with those high-priced diamonds!
# Convert to interactive plot
ggplotly(scatter_plot)
Question: What can you infer from this scatter plot? Do bigger diamonds (higher carat) tend to cost more? Notice, too, the steep jumps in price for certain diamonds.
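One way to put a number on the pattern you see in the scatter plot is a correlation coefficient. The sketch below uses base R's `cor()`; the variable names `carat_price_cor` and `carat_price_rank_cor` are illustrative. Values close to 1 would suggest that price tends to rise with carat, though a correlation alone does not explain the jumps or the spread at a given weight.

```r
library(ggplot2)  # provides the diamonds dataset

# Pearson correlation: strength of the linear relationship
carat_price_cor <- cor(diamonds$carat, diamonds$price)

# Spearman correlation: rank-based, so it also captures a curved but monotonic trend
carat_price_rank_cor <- cor(diamonds$carat, diamonds$price, method = "spearman")

c(pearson = carat_price_cor, spearman = carat_price_rank_cor)
```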
Practice 3: Histogram (Distribution of Numeric Data)
Histograms help you see the distribution of a single numeric variable. Let’s check out the distribution of diamond prices. Ready to be shocked?
# Create a ggplot histogram
histogram <- ggplot(diamonds, aes(x = price)) +
  geom_histogram(binwidth = 1000, fill = "green", color = "black") +
  ggtitle("Distribution of Diamond Prices") +
  theme_minimal()
histogram
Make it interactive to explore those outlier prices!
# Convert to interactive plot
ggplotly(histogram)
Challenge: Try adjusting the binwidth in the histogram and see how it changes the shape of the distribution. What happens when you make the binwidth smaller or larger?
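As a starting point for the challenge, here is a sketch that draws the same histogram with a narrow and a wide bin. The binwidths of 250 and 5000 and the names `hist_narrow` and `hist_wide` are arbitrary choices made for illustration.

```r
library(ggplot2)  # provides the diamonds dataset and ggplot()

# Narrow bins: more detail, but a noisier shape
hist_narrow <- ggplot(diamonds, aes(x = price)) +
  geom_histogram(binwidth = 250, fill = "green", color = "black") +
  ggtitle("Diamond Prices (binwidth = 250)") +
  theme_minimal()

# Wide bins: smoother shape, but fine detail is hidden
hist_wide <- ggplot(diamonds, aes(x = price)) +
  geom_histogram(binwidth = 5000, fill = "green", color = "black") +
  ggtitle("Diamond Prices (binwidth = 5000)") +
  theme_minimal()

hist_narrow
hist_wide
```

Compare the two plots side by side, then try values in between to find a binwidth that shows the overall shape without too much noise.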
Bonus Practice: Add More Bling (Customization)
The cool thing about Plotly is that you can keep customizing. Let’s spice up our scatter plot by adding color based on the diamond’s clarity.
# Create a customized ggplot scatter plot
scatter_plot_colored <- ggplot(diamonds, aes(x = carat, y = price, color = clarity)) +
  geom_point(alpha = 0.5) +
  ggtitle("Carat vs Price (Colored by Clarity)") +
  theme_minimal()
# Convert to interactive plot
ggplotly(scatter_plot_colored)
Now you can see the relationship between price, carat, and clarity interactively! Hover over the points to discover the clarity of those shiny diamonds.
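With tens of thousands of points, the interactive scatter plot can feel sluggish in the browser. One possible workaround, sketched below, is to plot a random subset instead, assuming the subset is representative enough for exploration. The sample size of 5000, the seed, and the names `diamonds_sample`, `sampled_plot`, and `interactive_sample` are illustrative choices. The `tooltip` argument of `ggplotly()` is used here to limit the hover label to the listed aesthetics.

```r
library(ggplot2)  # provides the diamonds dataset and ggplot()
library(plotly)   # provides ggplotly()

set.seed(42)  # arbitrary seed so the same rows are drawn on each run

# Draw 5000 random rows (an illustrative size) from the full dataset
diamonds_sample <- diamonds[sample(nrow(diamonds), 5000), ]

sampled_plot <- ggplot(diamonds_sample, aes(x = carat, y = price, color = clarity)) +
  geom_point(alpha = 0.5) +
  ggtitle("Carat vs Price (Colored by Clarity, 5000-Row Sample)") +
  theme_minimal()

# Show only carat, price, and clarity in the hover label
interactive_sample <- ggplotly(sampled_plot, tooltip = c("x", "y", "colour"))
interactive_sample
```

Sampling trades completeness for responsiveness, so rare, very expensive diamonds may not appear in a given sample.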
In this tutorial, you learned how to: